Automatic generation of visual scenarios for spoken corpora acquisition
Abstract
The paper describes a system, implemented in Java, that generates written and visual scenarios for collecting speech corpora in the framework of a Tourism Information System. Methods and experimental results are also presented for evaluating the degree to which the proposed scenarios are understood. The corpus generated from visual scenarios appears to be much richer than the one generated from textual descriptions.
Similar resources
Robust Extraction of Subcategorization Data from Spoken Language
Subcategorization data has been crucial for various NLP tasks. Current methods for automatic subcategorization frame (SCF) acquisition usually proceed in two steps: first, generate all SCF cues from a corpus using a parser, and then filter out spurious SCF cues with statistical tests. Previous studies on SCF acquisition have worked mainly with written texts; spoken corpora have received little attention. Transcripts of ...
How Spoken Language Corpora Can Refine Current Speech Motor Training Methodologies
The growing availability of spoken language corpora presents new opportunities for enriching the methodologies of speech and language therapy. In this paper, we present a novel approach for constructing speech motor exercises, based on linguistic knowledge extracted from spoken language corpora. In our study with the Dutch Spoken Corpus, syllabic inventories were obtained by means of automatic ...
Automatic Extraction of Subcategorization Frames from Spoken Corpora
We built a system for automatically extracting subcategorization frames (SCFs) from corpora of spoken language. The acquisition system, based on the design proposed by Briscoe & Carroll (1997), consists of a statistical parser, an SCF extractor, an English lemmatizer, and an SCF evaluator. These four components are applied in sequence to retrieve SCFs associated with each verb predicate in the cor...
Automatic lexicon generation and dialogue modeling for spontaneous speech
This paper describes a novel framework for dialogue modeling based on a superword model, a superset of the word n-gram. This has a remarkable advantage, because only transcribed text is needed to obtain the model, and no word dictionary is needed. In this paper, it is shown that the expressions specific to dialogue speech are extracted automatically from the transcriptions of spoken dialogue corpora ...
Automatic generation of phonetic transcriptions for large speech corpora
We describe a method for the automatic production of phonetic transcriptions in large speech corpora. First, we focus on the application of different techniques for the generation of pronunciation variants. Then, we explain the application of a speech recognition system for selecting the acoustically best matching phonetic transcription. The system is evaluated on different test sets selected f...